MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences

Internal identifier: 000A37 (Main/Exploration); previous: 000A36; next: 000A38

Authors: Han Li [United States]; Fengzhu Sun [United States, People's Republic of China]

Source: Scientific Reports, 2018

RBID : PMC:6030160

Abstract

Predicting the hosts of newly discovered viruses is important for pandemic surveillance of infectious diseases. We investigated the use of alignment-based and alignment-free methods and support vector machine using mononucleotide frequency and dinucleotide bias to predict the hosts of viruses, and applied these approaches to three datasets: rabies virus, coronavirus, and influenza A virus. For coronavirus, we used the spike gene sequences, while for rabies and influenza A viruses, we used the more conserved nucleoprotein gene sequences. We compared the three methods under different scenarios and showed that their performances are highly correlated with the variability of sequences and sample size. For conserved genes like the nucleoprotein gene, longer k-mers than mono- and dinucleotides are needed to better distinguish the sequences. We also showed that both alignment-based and alignment-free methods can accurately predict the hosts of viruses. When alignment is difficult to achieve or highly time-consuming, alignment-free methods can be a promising substitute to predict the hosts of new viruses.
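
The mononucleotide frequency and dinucleotide bias features named in the abstract can be illustrated with a minimal Python sketch. This record does not give the paper's exact formulas, so the odds-ratio definition of dinucleotide bias used here, f(xy) / (f(x) * f(y)), and all function names are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Relative frequency of every k-mer over ACGT, using overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Windows containing ambiguous bases (e.g. N) are left out of the total.
    total = sum(counts[km] for km in kmers)
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


def dinucleotide_bias(seq):
    """Assumed definition: observed f(xy) divided by expected f(x) * f(y)."""
    mono = kmer_frequencies(seq, 1)
    di = kmer_frequencies(seq, 2)
    bias = {}
    for xy, f_xy in di.items():
        expected = mono[xy[0]] * mono[xy[1]]
        bias[xy] = f_xy / expected if expected else 0.0
    return bias


def feature_vector(seq):
    """4 mononucleotide frequencies followed by 16 dinucleotide bias values."""
    mono = kmer_frequencies(seq, 1)
    bias = dinucleotide_bias(seq)
    return [mono[b] for b in ALPHABET] + [bias[xy] for xy in sorted(bias)]
```

A vector built this way could be passed to any SVM implementation as one training row per viral sequence. Calling `kmer_frequencies` with a larger `k` gives the longer k-mer profiles that the abstract reports are needed for conserved genes such as the nucleoprotein gene.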


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030160
DOI: 10.1038/s41598-018-28308-x
PubMed: 29968780
PubMed Central: 6030160


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences</title>
<author>
<name sortKey="Li, Han" sort="Li, Han" uniqKey="Li H" first="Han" last="Li">Han Li</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2156 6853</institution-id>
<institution-id institution-id-type="GRID">grid.42505.36</institution-id>
<institution>Molecular and Computational Biology Program, Department of Biological Sciences,</institution>
<institution>University of Southern California,</institution>
</institution-wrap>
Los Angeles, CA 90089 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
<wicri:cityArea>Los Angeles</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Sun, Fengzhu" sort="Sun, Fengzhu" uniqKey="Sun F" first="Fengzhu" last="Sun">Fengzhu Sun</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2156 6853</institution-id>
<institution-id institution-id-type="GRID">grid.42505.36</institution-id>
<institution>Molecular and Computational Biology Program, Department of Biological Sciences,</institution>
<institution>University of Southern California,</institution>
</institution-wrap>
Los Angeles, CA 90089 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
<wicri:cityArea>Los Angeles</wicri:cityArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0125 2443</institution-id>
<institution-id institution-id-type="GRID">grid.8547.e</institution-id>
<institution>Centre for Computational Systems Biology, School of Mathematical Sciences,</institution>
<institution>Fudan University,</institution>
</institution-wrap>
Shanghai, 200433 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Shanghai</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">29968780</idno>
<idno type="pmc">6030160</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030160</idno>
<idno type="RBID">PMC:6030160</idno>
<idno type="doi">10.1038/s41598-018-28308-x</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">000451</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000451</idno>
<idno type="wicri:Area/Pmc/Curation">000451</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000451</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000620</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000620</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:29968780</idno>
<idno type="wicri:Area/PubMed/Corpus">000848</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000848</idno>
<idno type="wicri:Area/PubMed/Curation">000848</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000848</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000984</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000984</idno>
<idno type="wicri:Area/Ncbi/Merge">001E87</idno>
<idno type="wicri:Area/Ncbi/Curation">001E87</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001E87</idno>
<idno type="wicri:Area/Main/Merge">000A40</idno>
<idno type="wicri:Area/Main/Curation">000A37</idno>
<idno type="wicri:Area/Main/Exploration">000A37</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences</title>
<author>
<name sortKey="Li, Han" sort="Li, Han" uniqKey="Li H" first="Han" last="Li">Han Li</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2156 6853</institution-id>
<institution-id institution-id-type="GRID">grid.42505.36</institution-id>
<institution>Molecular and Computational Biology Program, Department of Biological Sciences,</institution>
<institution>University of Southern California,</institution>
</institution-wrap>
Los Angeles, CA 90089 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
<wicri:cityArea>Los Angeles</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Sun, Fengzhu" sort="Sun, Fengzhu" uniqKey="Sun F" first="Fengzhu" last="Sun">Fengzhu Sun</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2156 6853</institution-id>
<institution-id institution-id-type="GRID">grid.42505.36</institution-id>
<institution>Molecular and Computational Biology Program, Department of Biological Sciences,</institution>
<institution>University of Southern California,</institution>
</institution-wrap>
Los Angeles, CA 90089 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
<wicri:cityArea>Los Angeles</wicri:cityArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0125 2443</institution-id>
<institution-id institution-id-type="GRID">grid.8547.e</institution-id>
<institution>Centre for Computational Systems Biology, School of Mathematical Sciences,</institution>
<institution>Fudan University,</institution>
</institution-wrap>
Shanghai, 200433 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Shanghai</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Scientific Reports</title>
<idno type="eISSN">2045-2322</idno>
<imprint>
<date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Coronavirus (genetics)</term>
<term>DNA, Viral (analysis)</term>
<term>Genome, Viral (genetics)</term>
<term>Host Microbial Interactions (genetics)</term>
<term>Host Microbial Interactions (physiology)</term>
<term>Influenza A virus (genetics)</term>
<term>Middle East Respiratory Syndrome Coronavirus (genetics)</term>
<term>Models, Theoretical</term>
<term>Pandemics</term>
<term>Phylogeny</term>
<term>Rabies virus (genetics)</term>
<term>Sequence Alignment (methods)</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Spike Glycoprotein, Coronavirus (genetics)</term>
<term>Support Vector Machine</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>ADN viral (analyse)</term>
<term>Alignement de séquences</term>
<term>Analyse de séquence d'ADN</term>
<term>Coronavirus (génétique)</term>
<term>Coronavirus du syndrome respiratoire du Moyen-Orient (génétique)</term>
<term>Glycoprotéine de spicule des coronavirus (génétique)</term>
<term>Génome viral (génétique)</term>
<term>Machine à vecteur de support</term>
<term>Modèles théoriques</term>
<term>Pandémies</term>
<term>Phylogénie</term>
<term>Virus de la grippe A (génétique)</term>
<term>Virus de la rage (génétique)</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="analysis" xml:lang="en">
<term>DNA, Viral</term>
</keywords>
<keywords scheme="MESH" qualifier="analyse" xml:lang="fr">
<term>ADN viral</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Coronavirus</term>
<term>Genome, Viral</term>
<term>Host Microbial Interactions</term>
<term>Influenza A virus</term>
<term>Middle East Respiratory Syndrome Coronavirus</term>
<term>Rabies virus</term>
<term>Spike Glycoprotein, Coronavirus</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Coronavirus</term>
<term>Coronavirus du syndrome respiratoire du Moyen-Orient</term>
<term>Glycoprotéine de spicule des coronavirus</term>
<term>Génome viral</term>
<term>Virus de la grippe A</term>
<term>Virus de la rage</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Sequence Alignment</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" qualifier="physiology" xml:lang="en">
<term>Host Microbial Interactions</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Models, Theoretical</term>
<term>Pandemics</term>
<term>Phylogeny</term>
<term>Support Vector Machine</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Alignement de séquences</term>
<term>Analyse de séquence d'ADN</term>
<term>Machine à vecteur de support</term>
<term>Modèles théoriques</term>
<term>Pandémies</term>
<term>Phylogénie</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p id="Par1">Predicting the hosts of newly discovered viruses is important for pandemic surveillance of infectious diseases. We investigated the use of alignment-based and alignment-free methods and support vector machine using mononucleotide frequency and dinucleotide bias to predict the hosts of viruses, and applied these approaches to three datasets: rabies virus, coronavirus, and influenza A virus. For coronavirus, we used the spike gene sequences, while for rabies and influenza A viruses, we used the more conserved nucleoprotein gene sequences. We compared the three methods under different scenarios and showed that their performances are highly correlated with the variability of sequences and sample size. For conserved genes like the nucleoprotein gene, longer
<italic>k</italic>
-mers than mono- and dinucleotides are needed to better distinguish the sequences. We also showed that both alignment-based and alignment-free methods can accurately predict the hosts of viruses. When alignment is difficult to achieve or highly time-consuming, alignment-free methods can be a promising substitute to predict the hosts of new viruses.</p>
</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
<li>États-Unis</li>
</country>
<region>
<li>Californie</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Californie">
<name sortKey="Li, Han" sort="Li, Han" uniqKey="Li H" first="Han" last="Li">Han Li</name>
</region>
<name sortKey="Sun, Fengzhu" sort="Sun, Fengzhu" uniqKey="Sun F" first="Fengzhu" last="Sun">Fengzhu Sun</name>
</country>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Sun, Fengzhu" sort="Sun, Fengzhu" uniqKey="Sun F" first="Fengzhu" last="Sun">Fengzhu Sun</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A37 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A37 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:6030160
   |texte=   Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:29968780" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021